Phrase Pair Classification for Identifying Subtopics

Authors

  • Sujatha Das Gollapalli
  • Prasenjit Mitra
  • C. Lee Giles
Abstract

Automatic identification of subtopics for a given topic is desirable because it eliminates the need for manual construction of domain-specific topic hierarchies. In this paper, we design features based on corpus statistics and use them to build a classifier that identifies (subtopic, topic) links between phrase pairs. We combine these features with commonly used syntactic patterns to classify phrase pairs from datasets in Computer Science and WordNet. In addition, we show a novel application of our is-a-subtopic-of classifier to query expansion in Expert Search and compare it with pseudo-relevance feedback.
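The abstract does not say which corpus statistics the authors use. As a purely illustrative sketch, one plausible kind of feature is document co-occurrence: if nearly every document that mentions a candidate subtopic also mentions the topic, but not the other way round, the pair looks like a (subtopic, topic) link. The function names, feature names, and toy corpus below are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def doc_frequency(phrase: str, docs: List[str]) -> int:
    """Count documents containing the phrase (case-insensitive substring match)."""
    needle = phrase.lower()
    return sum(1 for d in docs if needle in d.lower())


def pair_features(subtopic: str, topic: str, docs: List[str]) -> Dict[str, float]:
    """Hypothetical co-occurrence features for a (subtopic, topic) phrase pair."""
    sub, top = subtopic.lower(), topic.lower()
    df_sub = doc_frequency(sub, docs)
    df_top = doc_frequency(top, docs)
    # Documents that mention both phrases.
    df_both = sum(1 for d in docs if sub in d.lower() and top in d.lower())
    return {
        # Share of subtopic documents that also mention the topic.
        "p_topic_given_sub": df_both / df_sub if df_sub else 0.0,
        # Share of topic documents that also mention the subtopic.
        "p_sub_given_topic": df_both / df_top if df_top else 0.0,
        # Relative document frequency of the two phrases.
        "df_ratio": df_sub / df_top if df_top else 0.0,
    }


# Toy corpus, invented for illustration only.
docs = [
    "machine learning methods such as support vector machines",
    "support vector machines are a machine learning technique",
    "machine learning for information retrieval",
    "databases and query optimization",
]
features = pair_features("support vector machines", "machine learning", docs)
print(features)
```

Feature vectors of this kind could be concatenated with syntactic-pattern indicators and passed to any standard binary classifier. Which features and which classifier the paper actually uses is not stated in the abstract.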


Similar resources

Investigating Embedded Question Reuse in Question Answering

The investigation presented in this paper is a novel method in question answering (QA) that enables a QA system to gain performance by reusing information from the answer to one question to answer another, related question. Our analysis shows that a pair of questions in general open-domain QA can have an embedding relation through their mentions of noun phrase expressions. We present methods f...


A Bipartite Graph-Based Ranking Approach to Query Subtopics Diversification Focused on Word Embedding Features

Web search queries are usually vague, ambiguous, or tend to have multiple intents. Users have different search intents while issuing the same query. Understanding the intents through mining subtopics underlying a query has gained much interest in recent years. Query suggestions provided by search engines hold some intents of the original query, however, suggested queries are often noisy and con...


Why and How to Pay Different Attention to Phrase Alignments of Different Intensities

This work studies comparatively two typical sentence pair classification tasks: textual entailment (TE) and answer selection (AS), observing that phrase alignments of different intensities contribute differently in these tasks. We address the problems of identifying phrase alignments of flexible granularity and pooling alignments of different intensities for these tasks. Examples for flexible g...


A Machine Learning Approach for QA and Novelty Tracks: NTT System Description

In one sense, the goals of QA and Novelty tasks are the same: extracting small document parts which are relevant to users’ queries. Additionally, the unit of extraction is almost always fixed in both tasks. For QA, an answer is a noun phrase in most cases, and for Novelty, a sentence is recognized as the basic information unit. This observation leads us to the following unified approach to both...


Query Subtopic Mining via Subtractive Initialization of Non-negative Sparse Latent Semantic Analysis

Ambiguous and multifaceted queries widely exist in academic and commercial search engines. Identifying the popular subtopics of queries is an important issue for search engines. In this paper, we propose a novel method to discover the popular subtopics for a given query. Our method first constructs a search behavior tripartite graph based on the search log data. Then, we utilize a subtractive i...



Publication date: 2012